Abstract

A generalization of the polar coding scheme is proposed. It exploits several homogeneous kernels over alphabets of different sizes. An analysis of the introduced scheme is undertaken. Specifically, asymptotic properties of the polarization are shown to be strongly related to those of the constituent kernels.
1 Introduction



Polar codes were introduced by Arikan in [1] and provided a scheme for achieving the symmetric capacity of binary memoryless channels (B-MC) with polynomial encoding and decoding complexity. Originally, Arikan considered a simple binary, linear, 2-dimensional kernel based on the $(u+v, v)$ mapping. This mapping is extended to support an arbitrary code length $N = 2^n$ by a Kronecker power of the generator matrix that defines the transformation. Multiplying a permutation of an input vector $u$ by this matrix results in a vector $x$ that is transmitted through $N$ independent copies of a memoryless channel $W$. As a result, $N$ dependent channels are created between the components of $u$ and the outputs of the copies of $W$. These channels exhibit polarization under successive cancellation (SC) decoding: as $n$ grows, a fraction $I(W)$ (the symmetric capacity of $W$) of the channels have capacity approaching 1, while the capacity of the remaining channels approaches 0.

The exponent of a kernel, as a measure of the asymptotic rate of polarization of general polar codes, was introduced in [5], and generalizations were given in [3]. In [3], the authors suggested designing binary kernels based on the idea of code decomposition. However, a more natural approach is to exploit an explicit (non-binary) code decomposition in order to design a kernel. This usually requires introducing additional kernels alongside the initial binary kernel, which results in a mixed kernel structure. Our objective in this paper is to explore such constructions and analyze them.

This paper is organized as follows. In Section 2 we review the idea of code decomposition and its relation to the design of polar code kernels. This notion motivates the introduction of mixed kernels. For simplicity, we present the concept of mixed kernels through an example of a specific construction that is composed of a binary kernel and a quaternary kernel; this is done in Section 3. General mixed kernels are considered in Section 4. Simulation results and conclusions are given in Section 5.

Throughout, we use the following notation. For $i \ge j$, let $u_j^i = (u_j, \ldots, u_i)$ be the sub-vector of a vector $u$, of length $i - j + 1$ (if $i < j$, we say that $u_j^i = ()$, the empty vector, and its length is 0). For a natural number $n$, we denote $[n] = \{1, 2, 3, \ldots, n\}$.

2 Preliminaries 

In [3], we explored the idea of code decomposition and its relation to the construction of binary kernels for polar codes. We review these ideas here.

Definition 1: The set $\{\mathcal{T}_1, \ldots, \mathcal{T}_m\}$ is called a decomposition of $\{0,1\}^\ell$ if $\mathcal{T}_1 = \{\{0,1\}^\ell\}$ and, for each $i \in [m-1]$, every set $T_i^{(b_1^{i-1})} \in \mathcal{T}_i$ is partitioned into $2^{m_i}$ equally sized sets $\{T_{i+1}^{(b_1^{i})}\}_{b_i = 0, 1, \ldots, 2^{m_i}-1}$, each of size $2^\ell / \prod_{j=1}^{i} 2^{m_j}$. We denote the set of sub-codes of level number $i$ by

$$\mathcal{T}_i = \left\{ T_i^{(b_1^{i-1})} \,\middle|\, b_j \in \{0, 1, 2, \ldots, 2^{m_j}-1\},\ j \in [i-1] \right\}.$$

The partition is usually described by the following chain of code parameters

$$(n_1, k_1, d_1) - (n_2, k_2, d_2) - \cdots - (n_m, k_m, d_m),$$

if for each $T \in \mathcal{T}_i$, $T$ is a code of length $n_i$, size $2^{k_i}$, and minimum distance at least $d_i$.

A transformation $g(\cdot)$ can be associated with a code decomposition in the following way.

Definition 2: Let $\{\mathcal{T}_1, \ldots, \mathcal{T}_{m+1}\}$ be a code decomposition of $\{0,1\}^\ell$ as in Definition 1 (with $m+1$ levels), such that $|T| = 1$ for every $T \in \mathcal{T}_{m+1}$. The transformation

$$g(v_1, v_2, \ldots, v_m) : \prod_{i=1}^{m} \{0,1\}^{m_i} \to \{0,1\}^\ell, \qquad \sum_{i=1}^{m} m_i = \ell \qquad (1)$$

induced by this binary code decomposition is defined as follows for $v \in \prod_{i=1}^{m} \{0,1\}^{m_i}$:

$$g(v_1^m) = x_1^\ell \quad \text{if } \{x_1^\ell\} = T_{m+1}^{(v_1^m)}, \qquad (2)$$

where in the notation $T_{m+1}^{(v_1^m)}$ we take the decimal representation of the components of $v$, for consistency with Definition 1. Sometimes it is useful to denote the argument of $g(\cdot)$ as the vector $u \in \{0,1\}^\ell$, i.e. to write $g(u)$ instead of $g(v)$, where $v \in \prod_{i=1}^{m} \{0,1\}^{m_i}$. In this case there is an obvious correspondence between $v$ and $u$, namely $v_i = u_{1+\sum_{j=1}^{i-1} m_j}^{\sum_{j=1}^{i} m_j}$, $i \in [m]$. We say that $v_i$ represents $m_i$ bits that are "glued" together. It is convenient to denote $v_i$ as $u_{s,t}$ if $v_i = u_s^t$.

In [U Example 1], we considered the decomposition into cosets described by the chain $(4,4,1) - (4,3,2) - (4,1,4)$. Using Definition 2 we introduce a kernel function

$$g_1(u_1, u_{2,3}, u_4) : \{0,1\} \times \{0,1\}^2 \times \{0,1\} \to \{0,1\}^4 \qquad (3)$$

that is induced by this decomposition. The first bit $u_1$ chooses between the sub-codes $T_2^{(0)}$ and $T_2^{(1)}$. The second and the third bits are glued together, forming a binary pair, or quaternary symbol, $u_{2,3}$, and they choose the correct sub-code of $T_2^{(u_1)}$. Finally, $u_4$ selects the codeword from the chosen sub-code. Note that the encoding can be implemented simply by multiplying $u$ by the proper generator matrix. Indeed, there is nothing new in this construction. The challenge is to extend this mapping to a mapping of length $N = 4^n$. The standard Arikan construction (based on the Kronecker power) does not suffice, because the glued bits $u_{2,3}$ need to be treated jointly as a quaternary symbol. To facilitate this, we suggest introducing a second, quaternary kernel $g_2(\cdot)$. Because different coordinates of the input of $g_1(\cdot)$ are from alphabets of different sizes, and because this polarization scheme incorporates two mapping functions, $g_1(\cdot)$ and $g_2(\cdot)$, we refer to the overall construction as a mixed kernel construction. Details on how to combine the kernels $g_1(\cdot)$ and $g_2(\cdot)$ into a mixed kernel are given in Section 3.
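The remark about generator matrices can be made concrete. Below is a minimal Python sketch of $g_1(\cdot)$; the generator matrix `G1` is our own choice and is not given in the text. Its first row is a coset leader of the $(4,3,2)$ code, its middle two rows extend the repetition code to the $(4,3,2)$ code, and its last row spans the $(4,1,4)$ code (it happens to be the $4 \times 4$ Kronecker square of Arikan's kernel, with $u_2$ and $u_3$ glued). The assertions check that it induces the chain $(4,4,1) - (4,3,2) - (4,1,4)$.

```python
import itertools

# Assumed generator matrix for g_1; rows correspond to u_1, u_2, u_3, u_4.
G1 = [
    [1, 0, 0, 0],  # u_1: selects a coset of the (4,3,2) even-weight code
    [1, 1, 0, 0],  # u_2 and u_3 (glued) select one of the four cosets of
    [1, 0, 1, 0],  # the (4,1,4) repetition code inside that coset
    [1, 1, 1, 1],  # u_4: selects the codeword inside the (4,1,4) coset
]
PAIRS = list(itertools.product([0, 1], repeat=2))
INPUTS = list(itertools.product([0, 1], PAIRS, [0, 1]))  # all (u_1, u_{2,3}, u_4)


def g1(u1, u23, u4):
    """Encode (u_1, u_{2,3}, u_4) as x_1^4 = u G1 over GF(2); u23 is a pair of bits."""
    u = (u1, u23[0], u23[1], u4)
    return tuple(sum(u[r] * G1[r][c] for r in range(4)) % 2 for c in range(4))


# g_1 is a bijection of {0,1}^4.
assert len({g1(*v) for v in INPUTS}) == 16
# u_1 = 0 yields exactly the (4,3,2) even-weight code.
assert {g1(0, p, b) for p in PAIRS for b in (0, 1)} == {
    x for x in itertools.product([0, 1], repeat=4) if sum(x) % 2 == 0
}
# With (u_1, u_{2,3}) fixed, the two outputs differ by 1111: a (4,1,4) coset.
for u1, p in itertools.product([0, 1], PAIRS):
    diff = tuple(a ^ b for a, b in zip(g1(u1, p, 0), g1(u1, p, 1)))
    assert diff == (1, 1, 1, 1)
```

Any other generator matrix whose rows are nested in the same way would serve equally well; this one is convenient because it makes the link with Arikan's kernel explicit.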

3 Mixed Kernels by an Example 

We begin by describing the construction of a mixed kernel from several homogeneous kernels over different alphabets. For the sake of a clear presentation, we focus on a specific construction. Generalizations follow easily from this example and are given in Section 4.

3.1 Construction of a Mixed Kernel 

Let $g_1(\cdot)$ be the mapping defined in (3). Let $g_2(\cdot) : (\{0,1\}^2)^4 \to (\{0,1\}^2)^4$ be a polarizing kernel over the quaternary alphabet. For example, $g_2(\cdot)$ can be the kernel based on the extended Reed-Solomon code of length 4, RS(4), which was shown in [3, Example 20] to be a polarizing kernel. Using $g_2(\cdot)$, we can extend the mapping $g_1(\cdot)$ to a mapping of length $N = 4^n$. Note that $g_2(\cdot)$ is introduced in order to handle the glued bits $u_{2,3}$ at the input of $g_1(\cdot)$.



Let us first review the channel splitting principle using $g_1(\cdot)$. The output of $g_1(\cdot)$ is binary, and so is the channel over which the result of the transformation is sent. Gluing two inputs together means that we want these inputs to be treated as one unit for decision making and decoding. Assume that a binary vector $u$ is transformed to $x$ by $g_1(\cdot)$:

$$g_1(u_1, u_{2,3}, u_4) = x_1^4, \qquad u_1, u_4 \in \{0,1\},\ u_{2,3} \in \{0,1\}^2,\ x_i \in \{0,1\},\ i \in [4].$$

The vector $x_1^4$ is transmitted over 4 copies of the binary memoryless channel $W$, and we receive the output vector $y_1^4$. Writing $W^4(y_1^4 | u) = \prod_{i=1}^{4} W(y_i | x_i)$ with $x_1^4 = g_1(u)$, the channel splitting principle dictates the following channels:

$$W^{(1)}(y_1^4 | u_1) = \frac{1}{2^3} \sum_{u_{2,3} \in \{0,1\}^2,\ u_4 \in \{0,1\}} W^4(y_1^4 | u_1, u_{2,3}, u_4),$$

$$W^{(2,3)}(y_1^4, u_1 | u_{2,3}) = \frac{1}{2^2} \sum_{u_4 \in \{0,1\}} W^4(y_1^4 | u_1, u_{2,3}, u_4),$$

$$W^{(4)}(y_1^4, u_1, u_{2,3} | u_4) = \frac{1}{2^3} W^4(y_1^4 | u_1, u_{2,3}, u_4).$$
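For the binary erasure channel these split channels can be evaluated exactly by enumeration. The sketch below assumes uniform inputs, a BEC with erasure probability `eps`, and an assumed generator matrix for $g_1(\cdot)$ (the $4\times4$ Kronecker square of Arikan's kernel, which induces the chain $(4,4,1)-(4,3,2)-(4,1,4)$). For a linear map on a BEC, the posterior of a block given the outputs and the earlier inputs is uniform over the consistent values, so each conditional entropy is $\log_2$ of the number of consistent values. The three informations should sum to $4(1-\varepsilon)$, the chain-rule identity used later in equation (5).

```python
import itertools
import math

G1 = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]  # assumed g_1 matrix
BLOCKS = [[0], [1, 2], [3]]  # row indices of u_1, the glued u_{2,3}, and u_4


def encode(u):
    """x = u G1 over GF(2)."""
    return tuple(sum(u[r] * G1[r][c] for r in range(4)) % 2 for c in range(4))


def split_informations(eps):
    """Return [I(W^(1)), I(W^(2,3)), I(W^(4))] in bits for g_1 over a BEC(eps)."""
    result = []
    for b, block in enumerate(BLOCKS):
        later = [r for blk in BLOCKS[b + 1:] for r in blk]
        expected_entropy = 0.0
        for erased in itertools.product([0, 1], repeat=4):
            prob = math.prod(eps if e else 1 - eps for e in erased)
            seen = [c for c in range(4) if not erased[c]]
            # Linearity: assume u = 0 was sent, so earlier blocks are 0 and the
            # unerased outputs are 0. Collect block values consistent with that.
            consistent = set()
            for t in itertools.product([0, 1], repeat=len(block)):
                for s in itertools.product([0, 1], repeat=len(later)):
                    u = [0] * 4
                    for r, bit in zip(block + later, t + s):
                        u[r] = bit
                    if all(encode(u)[c] == 0 for c in seen):
                        consistent.add(t)
            # On the BEC the posterior is uniform over the consistent values.
            expected_entropy += prob * math.log2(len(consistent))
        result.append(len(block) - expected_entropy)
    return result


print(split_informations(0.5))  # [0.0625, 1.0, 0.9375]
```

Note that the glued channel carries two bits, so its information lies in $[0, 2]$ rather than $[0, 1]$; this is why the information process is later normalized by the number of glued bits.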

Next, consider $g_2(\cdot)$, which is a mapping with quaternary input and output. A binary vector $u \in \{0,1\}^8$ is transformed into $x \in (\{0,1\}^2)^4$ in the following fashion:

$$g_2(u_{1,2}, u_{3,4}, u_{5,6}, u_{7,8}) = x_1^4, \qquad u_{2i-1,2i},\ x_i \in \{0,1\}^2,\ i \in [4].$$

The vector $x_1^4$ is transmitted over 4 copies of a quaternary-input memoryless channel $W$, and the output vector $y_1^4$ is received. By the channel splitting principle we get the following channels for $i \in [4]$:

$$W^{(2i-1,2i)}\left(y_1^4, u_1^{2i-2} \middle| u_{2i-1,2i}\right) = \frac{1}{4^3} \sum_{u_{2i+1}^{8} \in \{0,1\}^{8-2i}} W^4\left(y_1^4 \middle| u_1^{2i-2}, u_{2i-1,2i}, u_{2i+1}^{8}\right).$$

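We denote $g^{(1)}(\cdot) = g_1(\cdot)$. A mapping function of dimension 16, denoted by $g^{(2)}(\cdot)$, is constructed as follows. Let $u$ be a binary vector of length 16. Define $g_1(u_1, u_{2,3}, u_4) = a$, $g_2(u_{5,6}, u_{7,8}, u_{9,10}, u_{11,12}) = b$ and $g_1(u_{13}, u_{14,15}, u_{16}) = c$, so that $a_i, c_i \in \{0,1\}$ and $b_i \in \{0,1\}^2$. Finally,

$$g^{(2)}(u) = \left[g_1(a_1, b_1, c_1),\ g_1(a_2, b_2, c_2),\ g_1(a_3, b_3, c_3),\ g_1(a_4, b_4, c_4)\right]. \qquad (4)$$

Each quaternary symbol $b_i$ thus serves as the glued middle input of a copy of $g_1(\cdot)$.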

In order to extend this construction to a general kernel $g^{(k)}(u_1^{4^k})$ in which some of the inputs are glued, we suggest the following recursive algorithm.



Mixed Kernel Construction Algorithm 

STEP 1: Take 4 parallel copies of $g^{(k-1)}(\cdot)$, and label the binary inputs of $g^{(k)}(\cdot)$ (some of which will be glued) by $u_1, u_2, \ldots, u_{4^k}$. Denote the direct inputs of $g^{(k-1)}(\cdot)$ by $v_1^{4^{k-1}}$. Since we simultaneously deal with all 4 inputs having the same index in the copies of $g^{(k-1)}(\cdot)$, there is no need to denote them separately. Our goal is to allocate the inputs $u_1, u_2, \ldots, u_{4^k}$, glue them together if necessary, perform the proper transformation (i.e. $g_1(\cdot)$ or $g_2(\cdot)$), and connect the outputs of these transformations to the inputs of $g^{(k-1)}(\cdot)$.

Initialize two counters: $i \leftarrow 1$ for the inputs of $g^{(k-1)}(\cdot)$ and $j \leftarrow 1$ for the inputs of $g^{(k)}(\cdot)$.

STEP 2: Consider input $v_i$ of $g^{(k-1)}(\cdot)$. There are two cases:
(a) $v_i$ is single (i.e. it is not glued to the next input). Allocate inputs $u_j, u_{j+1}, u_{j+2}, u_{j+3}$, use them as inputs to $g_1(\cdot)$, and use the outputs of the transformation as inputs to the ordered copies of $v_i$ in the inputs of the copies of $g^{(k-1)}(\cdot)$. Note that $u_{j+1}$ and $u_{j+2}$ are now glued together (this symbol is denoted by $u_{j+1,j+2}$), and the other inputs are binary. Set $i \leftarrow i + 1$, $j \leftarrow j + 4$.
(b) $v_i$ is glued to $v_{i+1}$ (i.e. we have a quaternary symbol $v_{i,i+1}$). Allocate inputs $u_j^{j+7}$, glue them in pairs (i.e. $u_{j,j+1}, u_{j+2,j+3}, u_{j+4,j+5}, u_{j+6,j+7}$), use the four pairs as inputs to $g_2(\cdot)$, and take the outputs of the transformation as inputs to the ordered copies of $v_{i,i+1}$ in the inputs of the copies of $g^{(k-1)}(\cdot)$. Set $i \leftarrow i + 2$, $j \leftarrow j + 8$.

If all the inputs of $g^{(k)}(\cdot)$ have been allocated, stop; otherwise, repeat STEP 2.
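The algorithm translates directly into a short recursive routine. The sketch below is illustrative only. The names `pattern` and `mixed_encode` are our own, and $g_1(\cdot)$ uses an assumed generator matrix (the Kronecker square of Arikan's kernel). Most importantly, `g2` is a hypothetical placeholder, not RS(4): it applies Arikan's $4\times4$ kernel to bit pairs, which is enough to exercise the wiring of STEP 2. A real implementation would substitute an RS(4) kernel over GF(4).

```python
import itertools

# Assumed binary kernel g_1: Kronecker square of Arikan's kernel, u_2 and u_3 glued.
G1 = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]


def g1(u1, u23, u4):
    """g_1: (bit, glued pair, bit) -> list of 4 bits, x = u G1 over GF(2)."""
    u = (u1, u23[0], u23[1], u4)
    return [sum(u[r] * G1[r][c] for r in range(4)) % 2 for c in range(4)]


def xor(s, t):
    """Add two quaternary symbols represented as bit pairs."""
    return (s[0] ^ t[0], s[1] ^ t[1])


def g2(s1, s2, s3, s4):
    """Hypothetical placeholder for g_2 (NOT RS(4)): Arikan's 4x4 kernel on bit pairs."""
    return [xor(xor(s1, s2), xor(s3, s4)), xor(s2, s4), xor(s3, s4), s4]


def pattern(k):
    """Input block sizes of g^(k): 1 = single bit, 2 = glued pair."""
    if k == 1:
        return [1, 2, 1]
    sizes = []
    for size in pattern(k - 1):
        sizes += [1, 2, 1] if size == 1 else [2, 2, 2, 2]
    return sizes


def mixed_encode(u, k):
    """Sketch of g^(k) following STEP 1 and STEP 2; u is a list of 4**k bits."""
    if k == 1:
        return g1(u[0], (u[1], u[2]), u[3])
    copies = [[] for _ in range(4)]  # bit inputs of the 4 copies of g^(k-1)
    j = 0  # position in u (0-based counterpart of the counter j)
    for size in pattern(k - 1):  # walk over the inputs v_i of g^(k-1)
        if size == 1:  # case (a): single v_i, apply g_1 to u_j..u_{j+3}
            outs = g1(u[j], (u[j + 1], u[j + 2]), u[j + 3])
            j += 4
            for t in range(4):
                copies[t].append(outs[t])
        else:  # case (b): glued v_{i,i+1}, apply g_2 to four pairs u_j..u_{j+7}
            outs = g2(*[(u[j + 2 * p], u[j + 2 * p + 1]) for p in range(4)])
            j += 8
            for t in range(4):
                copies[t].extend(outs[t])
    # Output t of each transformation feeds copy t; concatenate the copies' outputs.
    return [bit for t in range(4) for bit in mixed_encode(copies[t], k - 1)]


# Usage: g^(2) is a bijection of {0,1}^16 (exhaustive check over 65536 inputs).
images = {tuple(mixed_encode(list(u), 2)) for u in itertools.product([0, 1], repeat=16)}
print(len(images))  # 65536
```

For $k = 2$ the routine reproduces (4): the first and third blocks go through $g_1(\cdot)$, the middle eight bits go through the quaternary kernel, and output $t$ of each transformation feeds copy $t$.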



Note that the algorithm is consistent with the definition of $g^{(2)}(\cdot)$ in (4). The construction supports Arikan's analysis by the channel tree process, as we see in Section 3.2. Also note that, in this construction, successive cancellation decoding of the inputs of $g^{(k)}(\cdot)$ is actually decoding of inputs to the transformations $g_1(\cdot)$ or $g_2(\cdot)$, which use as a channel one of the synthesized channels generated by $g^{(k-1)}(\cdot)$ over $W$. In other words, decoding one bit $u_i$ (two glued bits $u_{i,i+1}$) over the channel $W_{4^k}^{(i)}(y, u_1^{i-1} | u_i)$ (over the channel $W_{4^k}^{(i,i+1)}(y, u_1^{i-1} | u_{i,i+1})$) is manifested as decoding a bit (a pair of glued bits) that is an input to the transformation $g_1(\cdot)$ or $g_2(\cdot)$. These transformations use the proper synthesized channel as their channel ($W_{4^{k-1}}^{(i')}$ or $W_{4^{k-1}}^{(i',i'+1)}$, for the appropriate index $i'$ depending on $i$).

3.2 The Tree Process 

We now turn to describe the channel tree process corresponding to this mixed kernel construction. A random sequence of channels $\{W_n\}_{n \ge 0}$ is defined, where $W_n$ takes values in the set $\{W_{4^n}^{(\tau_n(i))}\}_{i=1}^{\nu(n)}$. Here $\nu(n)$ denotes the number of channels (glued channels are counted as one), and $\tau_n(i)$ denotes the index of channel number $i$ ($\tau_n(i)$ is needed because some of the channels correspond to glued bits and are therefore indexed by a pair of integers). For example, for the channels $W_{16}^{(\cdot)}$, constructed using the transformation in (4), the number of channels is $\nu(2) = 10$, and the values of $\tau_2(\cdot)$ are $[1, (2,3), 4, (5,6), (7,8), (9,10), (11,12), 13, (14,15), 16]$. We also denote by $\{N_n\}_{n \ge 0}$ the number of bits at the input of the channel, which in our case is $N_n = 1$ for a single-bit channel and $N_n = 2$ for a channel of glued bits. We have the following definition of the channel process:

$$W_{n+1} = W_n^{(B_n)}, \quad n \ge 0; \qquad W_0 = W,\ N_0 = 1.$$
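The bookkeeping behind $\nu(n)$ and $\tau_n(\cdot)$ is easy to mechanize. The following sketch (our own illustration; the function names are hypothetical) expands the input block sizes level by level, as STEP 2 of the Mixed Kernel Construction Algorithm does: a single input spawns the blocks $1, 2, 1$ of $g_1(\cdot)$ and a glued input spawns four pairs through $g_2(\cdot)$. It then numbers the channels. It also counts glued channels, which can be compared with the closed form for $\gamma_n$ given later in this section.

```python
def pattern(n):
    """Input block sizes of g^(n): 1 = single-bit channel, 2 = glued pair."""
    sizes = [1, 2, 1]  # g^(1) = g_1
    for _ in range(n - 1):
        # A single input is expanded by g_1 (blocks 1, 2, 1); a glued one by g_2 (four pairs).
        sizes = [s for size in sizes for s in ([1, 2, 1] if size == 1 else [2, 2, 2, 2])]
    return sizes


def channel_indices(n):
    """tau_n(1..nu(n)): 1-based channel indices, with a pair for each glued channel."""
    labels, position = [], 1
    for size in pattern(n):
        labels.append(position if size == 1 else (position, position + 1))
        position += size
    return labels


def nu(n):
    """Number of synthesized channels, counting a glued channel once."""
    return len(pattern(n))


def glued_count(n):
    """Number of glued 2-bit channels among the nu(n) channels."""
    return sum(1 for size in pattern(n) if size == 2)


print(nu(2), channel_indices(2))
# 10 [1, (2, 3), 4, (5, 6), (7, 8), (9, 10), (11, 12), 13, (14, 15), 16]
```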

The probabilistic dynamics of $\{B_n\}_{n \ge 0}$ and $\{N_n\}_{n \ge 0}$ need to be described. Let $\{\bar{B}_n\}_{n \ge 0}$ be an i.i.d. random sequence taking the values $[1, (2,3), 4]$ with corresponding probabilities $[0.25, 0.5, 0.25]$, and let $\{\tilde{B}_n\}_{n \ge 0}$ be an i.i.d. random sequence taking the values $[(1,2), (3,4), (5,6), (7,8)]$ with uniform probabilities. Denote by $T$ the minimum non-negative $n$ such that $\bar{B}_n = (2,3)$, and set

$$N_n = \begin{cases} 1, & n \le T, \\ 2, & n > T. \end{cases}$$

Finally, set $B_n = \bar{B}_n$ for $n \le T$ and $B_n = \tilde{B}_n$ for $n > T$. Note that $T$ is a geometric random variable with probability of success $p = 0.5$. Furthermore, given the value of $T$, the sequence $\{B_n\}$ consists of independent samples (although the distribution is not identical for all samples). Note also that the pairs of numbers in the sequence $\{B_n\}$ indicate channels having inputs of two glued bits.
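A short simulation makes these dynamics concrete. The sketch below samples paths of $\{B_n\}$ and $\{N_n\}$ as defined above (the names `sample_path`, `G1_BRANCH`, and `G2_BRANCH` are our own) and estimates $\Pr(N_n = 2) = \Pr(T < n)$, which for a geometric $T$ with success probability 0.5 equals $1 - 2^{-n}$.

```python
import random

G1_BRANCH = [1, (2, 3), 4]                    # channels of g_1
G1_PROBS = [0.25, 0.5, 0.25]                  # proportional to 1, 2, 1 input bits
G2_BRANCH = [(1, 2), (3, 4), (5, 6), (7, 8)]  # channels of g_2, chosen uniformly


def sample_path(n, rng):
    """Return (B_0..B_{n-1}, N_0..N_n) for one path of the tree process."""
    b_path, n_path = [], [1]  # N_0 = 1
    glued = False
    for _ in range(n):
        if not glued:  # N_k = 1: the next kernel is g_1
            b = rng.choices(G1_BRANCH, weights=G1_PROBS)[0]
            glued = b == (2, 3)  # this step is T; W_{T+1} has glued inputs
        else:  # N_k = 2: the next kernel is g_2
            b = rng.choice(G2_BRANCH)
        b_path.append(b)
        n_path.append(2 if glued else 1)
    return b_path, n_path


rng = random.Random(1)
n, trials = 3, 100_000
estimate = sum(sample_path(n, rng)[1][-1] == 2 for _ in range(trials)) / trials
print(estimate, 1 - 2 ** -n)  # Monte Carlo estimate vs. Pr(T < n) = 1 - 2^-n
```

The estimate should agree with $0.875$ to about two decimal places; the exact value follows from the geometric law of $T$.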

Suppose we have a channel $W$ and a binary i.i.d. input vector $U_1^4$ that is transformed by $g_1(\cdot)$ to $X_1^4$, transmitted over the B-MC $W$, and received as $Y_1^4$. Then

$$4I(W) = I(Y_1^4; U_1^4) = I(Y_1^4; U_1) + I(Y_1^4; U_2^3 | U_1) + I(Y_1^4; U_4 | U_1^3) = I(W^{(1)}) + I(W^{(2,3)}) + I(W^{(4)}). \qquad (5)$$
Next, define the information random sequence corresponding to the channels as $\{I_n\}_{n \ge 0}$:

$$I_n = \frac{I(W_n)}{N_n}, \quad n \ge 0. \qquad (6)$$



The Bhattacharyya parameter sequence is denoted by $Z_n = Z(W_n)$, where for a $q$-ary channel $W$ we have

$$Z(W) = \frac{1}{q(q-1)} \sum_{x, x' \in \mathcal{X},\ x \ne x'} Z_{x,x'}(W).$$

Here, $\mathcal{X}$ is the input alphabet of the channel, $\mathcal{Y}$ is its output alphabet, and

$$Z_{x,x'}(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(Y = y | X = x)\, W(Y = y | X = x')}.$$

The maximum and the minimum of the Bhattacharyya parameters between two symbols are defined as $Z_{\max}(W) = \max_{x, x' \in \mathcal{X},\, x \ne x'} Z_{x,x'}(W)$ and $Z_{\min}(W) = \min_{x, x' \in \mathcal{X},\, x \ne x'} Z_{x,x'}(W)$. Observe that if $|\mathcal{X}| = 2$, then $Z_{\max}(W) = Z_{\min}(W) = Z(W)$.
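As a sanity check of these definitions, the following sketch computes $Z(W)$, $Z_{\max}(W)$ and $Z_{\min}(W)$ from a transition matrix `W[x][y]`, a representation we assume for illustration. Summing over ordered pairs $x \ne x'$ matches the $1/(q(q-1))$ normalization. The quaternary example shows how $Z_{\max}$ and $Z_{\min}$ can differ once $|\mathcal{X}| > 2$.

```python
import itertools
import math


def z_pair(W, x, xp):
    """Z_{x,x'}(W) = sum over y of sqrt(W(y|x) W(y|x')); W[x][y] is a transition matrix."""
    return sum(math.sqrt(a * b) for a, b in zip(W[x], W[xp]))


def bhattacharyya(W):
    """Return (Z(W), Z_max(W), Z_min(W)) for a channel with q = len(W) inputs."""
    q = len(W)
    values = [z_pair(W, x, xp) for x, xp in itertools.permutations(range(q), 2)]
    return sum(values) / (q * (q - 1)), max(values), min(values)


# Binary erasure channel with erasure probability 0.3 (outputs 0, erasure, 1).
bec = [[0.7, 0.3, 0.0], [0.0, 0.3, 0.7]]
print(bhattacharyya(bec))  # all three values equal 0.3 (up to rounding)

# A quaternary channel that confuses symbols 0 and 1 but never 2 and 3.
W4 = [[0.5, 0.5, 0, 0], [0.5, 0.5, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(bhattacharyya(W4))  # Z = 1/6, Z_max = 1, Z_min = 0
```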

Note that $I_n \in [0,1]$, and so is $Z_n$. By using [5, Proposition 3], it can be shown that $Z_n \to 1 \iff I_n \to 0$, and that $Z_n \to 0 \iff I_n \to 1$.

Proposition 1: The process $\{I_n\}_{n \ge 0}$ is a bounded martingale, which is uniformly integrable. As a result, it converges almost surely to $I_\infty$.

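Proof: By the definition of the information sequence (6), and using (5), we have

$$E[I_{n+1} | I_n, N_n = 1] = \frac{1}{4} I\left(W_n^{(1)}\right) + \frac{1}{2} \cdot \frac{I\left(W_n^{(2,3)}\right)}{2} + \frac{1}{4} I\left(W_n^{(4)}\right) \qquad (7)$$

$$= \frac{1}{4}\left(I\left(W_n^{(1)}\right) + I\left(W_n^{(2,3)}\right) + I\left(W_n^{(4)}\right)\right) = \frac{4 I(W_n)}{4} = I_n. \qquad (8)$$

On the other hand,

$$E[I_{n+1} | I_n, N_n = 2] = \frac{1}{4} \cdot \frac{I\left(W_n^{(1,2)}\right)}{2} + \frac{1}{4} \cdot \frac{I\left(W_n^{(3,4)}\right)}{2} + \frac{1}{4} \cdot \frac{I\left(W_n^{(5,6)}\right)}{2} + \frac{1}{4} \cdot \frac{I\left(W_n^{(7,8)}\right)}{2}, \qquad (9)$$

which is

$$E[I_{n+1} | I_n, N_n = 2] = \frac{1}{4} \cdot \frac{1}{2}\left(I\left(W_n^{(1,2)}\right) + I\left(W_n^{(3,4)}\right) + I\left(W_n^{(5,6)}\right) + I\left(W_n^{(7,8)}\right)\right) = \frac{4 I(W_n)}{8} = I_n, \qquad (10)$$

where we used the analogue of (5) for $g_2(\cdot)$ (its four quaternary inputs carry a total mutual information of $4I(W_n)$) and $I_n = I(W_n)/2$ when $N_n = 2$. So, by taking (8) and (10), we have

$$E[I_{n+1} | I_n] = I_n, \qquad (11)$$

which means that the sequence $\{I_n\}_{n \ge 0}$ is a martingale. Furthermore, it is uniformly integrable (see, for example, [5, Theorem 4.5.3]), and therefore it converges almost surely to $I_\infty$.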

Note that

$$\Pr(I_n \in S) = \frac{1}{4^n} \sum_{\substack{i \in [\nu(n)] \text{ s.t.} \\ I\left(W_{4^n}^{(\tau_n(i))}\right) / \#(\tau_n(i)) \in S}} \#(\tau_n(i)), \qquad (12)$$

where $\#(\tau_n(i))$ counts the number of bits at the input of channel $\tau_n(i)$, which is 1 for a single-bit channel and 2 for a glued 2-bit channel. A similar expression to (12) can be stated for the process $Z_n$. This probabilistic method gives both bits of a glued pair the same behavior in terms of probability of decoding error and mutual information, and they are counted accordingly. Note that $E[I_n] = E[I_\infty] = I(W)$. Thus, by showing that the mixed kernel is polarizing, i.e. $I_\infty \in \{0,1\}$, we may infer by (12) that the proportion of clean channels (created by the transformation and successive cancellation decoding) is $I(W)$.

Also note that for $g^{(n)}(\cdot)$ we can easily count the number of glued 2-bit input channels (denoted here by $\gamma_n$) as $\gamma_n = 4^n \cdot \frac{1}{2} \cdot \Pr(N_n = 2) = \frac{4^n}{2}\left(1 - \frac{1}{2^n}\right)$. The proportion of glued 2-bit channels goes to 1 as $n$ grows, and so does the proportion of uses of the $g_2(\cdot)$ kernel. Because of this, the properties of $g_2(\cdot)$ dominate the construction asymptotically. Specifically, we show in the sequel that if the kernel $g_2(\cdot)$ is polarizing, then so is the mixed kernel construction we propose. Moreover, if the kernel $g_2(\cdot)$ has a lower bound and an upper bound on the exponent, $E_1(g_2)$ and $E_2(g_2)$ respectively, then $E_1(g_2)$ and $E_2(g_2)$ also serve as a lower bound and an upper bound on the rate of polarization of the mixed configuration.

3.3 Polarization and Polarization Rate 

In this part, we study the polarization property of the mixed kernel and its rate of polarization. We show that the properties of $g_2(\cdot)$ determine the asymptotic properties of the mixed kernel.

Proposition 2: Assume that $g_2(\cdot)$ is a polarizing kernel, i.e. for a construction that is based only on $g_2(\cdot)$ we have

$$\lim_{n \to \infty} \Pr\left(I(W_n)/2 \in (\delta, 1-\delta)\right) = 0, \quad \forall \delta \in (0, 0.5). \qquad (13)$$

As a result, the mixed kernel construction is also polarizing, i.e.

$$\lim_{n \to \infty} \Pr\left(I_n \in (\delta, 1-\delta)\right) = 0, \quad \forall \delta \in (0, 0.5). \qquad (14)$$

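Proof: We prove that for a given $\delta$ and for each $\varepsilon > 0$ there exists $n_0 = n_0(\delta, \varepsilon)$ such that for all $n > n_0$

$$\Pr\left(I_n \in (\delta, 1-\delta)\right) < \varepsilon.$$

Let $n_1$ be chosen such that $\Pr(N_{n_1} = 2) > 1 - \frac{\varepsilon}{2}$ (this is possible since $\Pr(N_n = 2) = 1 - 2^{-n}$). Now, for $n = n_1$ consider all the glued-bits channels $W_{4^{n_1}}^{(j,j+1)}$. As $n$ grows further, only the kernel $g_2(\cdot)$ is applied to their descendants, so by (13) each one of them undergoes polarization; that is, each one of the $\gamma_{n_1}$ glued channels has an index $n_2(j,j+1)$ such that for $n > n_1 + n_2(j,j+1)$

$$\Pr\left(I_n \in (\delta, 1-\delta) \,\middle|\, W_{n_1} = W_{4^{n_1}}^{(j,j+1)}\right) < \frac{\varepsilon}{2}.$$

Denote by $n_2$ the maximum over these $n_2(j,j+1)$, and set $n_0 = n_1 + n_2$. Since there are finitely many such channels, for $n > n_0$ we have $\Pr(I_n \in (\delta, 1-\delta) | N_{n_1} = 2) < \frac{\varepsilon}{2}$, and therefore

$$\Pr\left(I_n \in (\delta, 1-\delta)\right) = \underbrace{\Pr\left(I_n \in (\delta, 1-\delta) | N_{n_1} = 1\right)}_{\le 1} \underbrace{\Pr(N_{n_1} = 1)}_{< \varepsilon/2} + \underbrace{\Pr\left(I_n \in (\delta, 1-\delta) | N_{n_1} = 2\right)}_{< \varepsilon/2} \underbrace{\Pr(N_{n_1} = 2)}_{\le 1} < \varepsilon. \qquad (15)$$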

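We now turn to discuss the polarization rate. To do this, we need to consider the partial distances of the kernels. We use the notation of [3]. For a given kernel $g(v_1, v_2, \ldots, v_m)$ as defined in (1), we give the following definitions:

$$D_{x,x'}^{(i)}\left(v_1^{i-1}\right) = \min_{w_{i+1}^m,\ w_{i+1}'^{m}} d_H\left(g\left(v_1^{i-1}, x, w_{i+1}^m\right), g\left(v_1^{i-1}, x', w_{i+1}'^{m}\right)\right),$$

$$D_{\max}^{(i)} = \max_{v_1^{i-1}} \max_{\substack{x, x' \in \{0,1\}^{m_i} \\ x \ne x'}} D_{x,x'}^{(i)}\left(v_1^{i-1}\right), \qquad D_{\min}^{(i)} = \min_{v_1^{i-1}} \min_{\substack{x, x' \in \{0,1\}^{m_i} \\ x \ne x'}} D_{x,x'}^{(i)}\left(v_1^{i-1}\right).$$

In order to distinguish between the partial distances of the two kernels, $g_1(\cdot)$ and $g_2(\cdot)$, we add an additional subscript to these parameters to indicate the kernel. For example, $D_{1,\min}^{(i)}$ and $D_{2,\min}^{(i)}$ denote the $i$-th minimum partial distance of kernel $g_1(\cdot)$ and kernel $g_2(\cdot)$, respectively. We note here that for the binary (linear) kernels considered here, we have $D_{\max}^{(i)} = D_{\min}^{(i)}$.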
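These definitions can be checked by brute force on a small kernel. The sketch below evaluates $D^{(i)}_{x,x'}(v_1^{i-1})$ for every prefix and every pair $x \ne x'$, and returns the minimum and maximum over both, as in the definitions of $D^{(i)}_{\min}$ and $D^{(i)}_{\max}$ above. It uses an assumed generator matrix for $g_1(\cdot)$ (the $4\times4$ Kronecker square of Arikan's kernel, with the second and third inputs glued), which realizes the chain $(4,4,1)-(4,3,2)-(4,1,4)$. The function names are our own, and inputs are indexed from 0.

```python
import itertools

G1 = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]  # assumed g_1 matrix
# Alphabets of v_1 = u_1 (one bit), v_2 = u_{2,3} (bit pair), v_3 = u_4 (one bit).
DOMAINS = [[(0,), (1,)], list(itertools.product([0, 1], repeat=2)), [(0,), (1,)]]


def g(v):
    """g_1 evaluated on v = (v_1, v_2, v_3), where each v_i is a tuple of bits."""
    u = [bit for part in v for bit in part]
    return [sum(u[r] * G1[r][c] for r in range(4)) % 2 for c in range(4)]


def partial_distances(i):
    """(D_min, D_max) for input i, by brute force over prefixes, pairs, and suffixes."""
    values = []
    suffixes = list(itertools.product(*DOMAINS[i + 1:]))
    for prefix in itertools.product(*DOMAINS[:i]):
        for x, xp in itertools.permutations(DOMAINS[i], 2):
            # D_{x,x'}(prefix): minimum Hamming distance over all pairs of suffixes.
            d = min(
                sum(a != b for a, b in zip(g(prefix + (x,) + w), g(prefix + (xp,) + wp)))
                for w in suffixes
                for wp in suffixes
            )
            values.append(d)
    return min(values), max(values)


print([partial_distances(i) for i in range(3)])  # [(1, 1), (2, 2), (4, 4)]
```

For this linear kernel the partial distances are the minimum distances $1, 2, 4$ of the chain, and $D_{\min} = D_{\max}$ for every input, including the glued one.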

Proposition 3: There exist positive constants $c_1, c_2$ such that

$$Z_{\max}(W_{n+1}) \le c_1 \cdot Z_{\max}(W_n)^{\underline{D}_n}, \qquad (16)$$

$$Z_{\min}(W_{n+1}) \ge c_2 \cdot Z_{\min}(W_n)^{\overline{D}_n}, \quad n \ge 0, \qquad (17)$$

where $Z_{\max}(W_n)$ and $Z_{\min}(W_n)$ are the maximum and the minimum Bhattacharyya parameters of the channel $W_n$. The sequences $\{\underline{D}_n\}_{n \ge 0}$ and $\{\overline{D}_n\}_{n \ge 0}$ are defined as follows:

$$\underline{D}_n = D_{t,\min}^{(f(B_n))}, \qquad \overline{D}_n = D_{t,\max}^{(f(B_n))},$$

where the parameter $t$, which indicates the kernel to which the partial distances refer, equals 1 if $N_n = 1$, and otherwise equals 2. The function $f(\cdot)$ maps the names of the channels to their ordinal numbers. For example, for $t = 1$ it gives $f(1) = 1$, $f((2,3)) = 2$ and $f(4) = 3$, and for $t = 2$ it gives $f((2i-1, 2i)) = i$.

Proof: First, we note that for the quaternary input channel we have [3, Corollary 18]

$$Z_{\max}\left(W^{(2i-1,2i)}\right) \le 4^{4-i}\, Z_{\max}(W)^{D_{2,\min}^{(i)}}, \quad i \in [4], \qquad (18)$$

$$\frac{1}{4^{2(4-i)}}\, Z_{\min}(W)^{D_{2,\max}^{(i)}} \le Z_{\min}\left(W^{(2i-1,2i)}\right), \quad i \in [4], \qquad (19)$$

where $Z_{\min}(W)$ and $Z_{\max}(W)$ are the minimum and the maximum of $Z_{x,x'}(W)$ over $x \ne x'$; if $|\mathcal{X}| = 2$, then $Z_{\max}(W) = Z_{\min}(W) = Z(W)$. Also note that for the binary kernel we have from [3, Corollary 18]

$$\frac{1}{2^6}\, Z(W)^{D_{1,\max}^{(1)}} \le Z\left(W^{(1)}\right) \le 2^3\, Z(W)^{D_{1,\min}^{(1)}},$$

$$Z(W)^{D_{1,\max}^{(3)}} \le Z\left(W^{(4)}\right) \le Z(W)^{D_{1,\min}^{(3)}}. \qquad (20)$$

Note that here, because we deal with binary inputs $u_1, u_4$ to $g_1(\cdot)$, we have $D_{1,\min}^{(1)} = D_{1,\max}^{(1)}$ and $D_{1,\min}^{(3)} = D_{1,\max}^{(3)}$. Also note that the difference between the indices of the channels and of the distance parameters in (20) is due to the glued bits $u_{2,3}$.

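We now need to consider the glued-bits channel, which results from $u_{2,3}$, the second input of $g_1(\cdot)$. To do so, we follow derivations similar to [3, Lemma 21]. We regard the input of this channel as quaternary, although the direct input of the channel is binary. Denote by $W_{u_1}^{(2,3)}$ the channel $W^{(2,3)}$ assuming that $u_1$ was transmitted, i.e. $W_{u_1}^{(2,3)}(y | x) = \frac{1}{2} \sum_{u_4 \in \{0,1\}} W^4(y | g_1(u_1, x, u_4))$. For $x \ne x' \in \{0,1\}^2$ we have

$$Z_{x,x'}\left(W_{u_1}^{(2,3)}\right) = \sum_y \sqrt{W_{u_1}^{(2,3)}(y|x)\, W_{u_1}^{(2,3)}(y|x')}$$

$$= \frac{1}{2} \sum_y \sqrt{\sum_{u_4 \in \{0,1\}} W^4\left(y | g_1(u_1, x, u_4)\right) \sum_{u_4' \in \{0,1\}} W^4\left(y | g_1(u_1, x', u_4')\right)}$$

$$\le \frac{1}{2} \sum_{u_4, u_4' \in \{0,1\}} \left( \sum_y \sqrt{W^4\left(y | g_1(u_1, x, u_4)\right) W^4\left(y | g_1(u_1, x', u_4')\right)} \right)$$

$$\le \frac{1}{2} \cdot 4 \cdot Z(W)^{D_{1,\min}^{(2)}}.$$

On the other hand,

$$Z_{x,x'}\left(W_{u_1}^{(2,3)}\right) \ge \frac{1}{2} \max_{u_4, u_4' \in \{0,1\}} \sum_y \sqrt{W^4\left(y | g_1(u_1, x, u_4)\right) W^4\left(y | g_1(u_1, x', u_4')\right)} \ge \frac{1}{2} Z(W)^{D_{1,\max}^{(2)}}.$$

Here we used $\sum_y \sqrt{W^4(y|a)\, W^4(y|b)} = Z(W)^{d_H(a,b)}$ for $a, b \in \{0,1\}^4$. Since $u_1$ is part of the output of $W^{(2,3)}$, we have $Z_{x,x'}(W^{(2,3)}) = \frac{1}{2} \sum_{u_1} Z_{x,x'}(W_{u_1}^{(2,3)})$, so the same bounds hold for $Z_{\max}(W^{(2,3)})$ and $Z_{\min}(W^{(2,3)})$. Therefore, taking $c_1 = 4^3$ and $c_2 = 4^{-6}$ gives (16) and (17).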

Proposition 3 enables us to derive the asymptotic rate of polarization in the way that was done in [2] and



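Proposition 4: If $g_2(\cdot)$ is a polarizing kernel and $Z(W) \ne 0$, then

$$\lim_{n \to \infty} \Pr\left(Z_n < 2^{-4^{n\beta}}\right) = I(W), \quad \forall \beta < E_1(g_2), \qquad (21)$$

$$\lim_{n \to \infty} \Pr\left(Z_n > 2^{-4^{n\beta}}\right) = 1, \quad \forall \beta > E_2(g_2), \qquad (22)$$

where $E_1(g_2) = \frac{1}{4} \sum_{i=1}^{4} \log_4\left(D_{2,\min}^{(i)}\right)$ and $E_2(g_2) = \frac{1}{4} \sum_{i=1}^{4} \log_4\left(D_{2,\max}^{(i)}\right)$. $E_1(g_2)$ and $E_2(g_2)$ are the lower bound and the upper bound (respectively) on the exponent of the kernel $g_2(\cdot)$.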
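Both exponent bounds are plain averages of logarithms of partial distances, so they are trivial to compute. In the sketch below, the partial distances $(1, 2, 3, 4)$ for RS(4) are an assumption on our part (they hold if the nested codes of the kernel are MDS). For such a linear kernel $D^{(i)}_{2,\min} = D^{(i)}_{2,\max}$, so $E_1(g_2) = E_2(g_2)$, and the two bounds of Proposition 4 then meet.

```python
import math


def exponent_bound(partial_distances):
    """(1/l) * sum_i log_l D^(i) for a kernel whose dimension l is len(partial_distances)."""
    l = len(partial_distances)
    return sum(math.log(d, l) for d in partial_distances) / l


# Assumed partial distances of the extended Reed-Solomon kernel RS(4) over GF(4).
rs4 = [1, 2, 3, 4]
print(round(exponent_bound(rs4), 4))  # 0.5731, i.e. log_4(4!)/4
# Arikan's 2x2 binary kernel, for comparison.
print(exponent_bound([1, 2]))  # 0.5
```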

Proof Idea: Following the path of [7, Section E] enables us to prove (21), using the following two observations: (a) $\lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} \log_4\left(\underline{D}_i\right) = E_1(g_2)$ almost surely; (b) conditioned on the value of the random variable $T$, the sequence $\{\underline{D}_n\}_{n \ge 1}$ consists of independent samples. Adjusting the proof in [8, Section III], while using the fact that, almost surely, $\lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} \log_4\left(\overline{D}_i\right) = E_2(g_2)$, results in (22). For details, see the Appendix. □

4 General Mixed Kernels 

The analysis in Section 3 was carried out for a specific dimension $\ell = 4$ and alphabet sizes 2 and 4. This technique can be generalized quite easily to general mixed kernel schemes. Let $g_1(v_1, v_2, \ldots, v_m)$ be equal to $g(\cdot)$ in (1). Denote the set of indices of the glued bits by $\mathcal{B} = \{i \in [m] \mid m_i \ge 2\}$. For each $i \in \mathcal{B}$ we supply a kernel $g_{i+1} : (\{0,1\}^{m_i})^\ell \to (\{0,1\}^{m_i})^\ell$ (by convention, if $m_i = m_j$ we usually take $g_{i+1}(\cdot) = g_{j+1}(\cdot)$). We note that in [H Table 5], the author gives a list of code decompositions that can be used for the definition of $g_1(\cdot)$. For the auxiliary kernels $g_{i+1}(\cdot)$, $i \in \mathcal{B}$, one can use non-binary kernels from [10].

The construction of a transform of larger dimension, $g^{(k)}(u_1^{\ell^k})$, can be done by a proper adjustment of the algorithm we suggested in Section 3, using the auxiliary kernels $g_{i+1}(\cdot)$, $i \in \mathcal{B}$, for the glued-bits inputs of $g^{(k-1)}(\cdot)$. A tree process can also be defined in a similar fashion to Section 3.2. The probabilities for the choice of descendant channels for the first kernel are $\frac{m_i}{\ell}$, $i \in [m]$, and the probabilities for the channels induced by the kernels $g_{i+1}(\cdot)$ for $i \in \mathcal{B}$ are uniform. The random variable $T$ indicates the transition from the initial kernel $g_1(\cdot)$ to one of the $q$-ary kernels, where $q > 2$. Finally, the information sequence $I_n = I(W_n)/N_n$ is a bounded martingale. Generalizing Section 3.3, we are able to show the following.

Proposition 5: Assume that for all $i \in \mathcal{B}$, $g_{i+1}(\cdot)$ is a polarizing kernel, i.e. for a construction that is based only on $g_{i+1}(\cdot)$ we have

$$\lim_{n \to \infty} \Pr\left(I(W_n)/m_i \in (\delta, 1-\delta)\right) = 0, \quad \forall \delta \in (0, 0.5).$$

As a result, the mixed kernel construction is also polarizing, i.e.

$$\lim_{n \to \infty} \Pr\left(I_n \in (\delta, 1-\delta)\right) = 0, \quad \forall \delta \in (0, 0.5).$$

Let $E_1(g) = \min_{i \in \mathcal{B}} E_1(g_{i+1})$ and $E_2(g) = \max_{i \in \mathcal{B}} E_2(g_{i+1})$, where $E_1(g_{i+1})$ and $E_2(g_{i+1})$ are, respectively, the lower bound and the upper bound on the exponent, assuming that we use only the kernel $g_{i+1}$.

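Proposition 6: If for all $i \in \mathcal{B}$, $g_{i+1}(\cdot)$ is a polarizing kernel and $Z(W) \ne 0$, then

$$\lim_{n \to \infty} \Pr\left(Z_n < 2^{-\ell^{n\beta}}\right) = I(W), \quad \forall \beta < E_1(g), \qquad (23)$$

$$\lim_{n \to \infty} \Pr\left(Z_n > 2^{-\ell^{n\beta}}\right) = 1, \quad \forall \beta > E_2(g). \qquad (24)$$

See the Appendix for details of the proof of Proposition 6.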
Figure 1: Upper bounds on the block error probability versus rate for three polar code structures under SC decoding, at block length $2^{14}$ bits on the BEC with erasure probability 0.5.



5 Simulations and Concluding Remarks 

Proposition 6 implies that, when the exponent is considered as a measure of the polarization rate, the behavior of the mixed kernel is the same as that of the weakest of the auxiliary kernels. However, the exponent is an asymptotic measure, and it may fail to capture the performance of a polar coding scheme of finite block length $N$.

In Figure 1 we give results of density evolution computations over the binary erasure channel (BEC) with erasure probability 0.5. Three polar codes with the same block length of $2^{14}$ bits are considered: $(u+v, v)$ is Arikan's binary polar code [1], RS(4) is the extended Reed-Solomon construction considered in [3, Example 20], and Mixed-A is the mixed kernel example from Section 3 (for the second kernel, $g_2(\cdot)$, we used RS(4)). To allow the RS(4) scheme to have the same length as the other schemes, we took two RS(4) transformations of length $2^{13}$ bits and applied the quaternary $(u+v, v)$ transformation to their outputs. The curves represent upper bounds on the block error probability versus rate under SC decoding. The upper bound here is the sum of the error probabilities of the split channels corresponding to the information set of the code. The information set for each curve point was determined using the technique of [U Section V.D]. The figure demonstrates an advantage of the mixed kernel code with respect to the other candidates. We note that, as the theory predicts, the gap between the Mixed-A and RS(4) curves decreases for codes of length $4^n$ bits as $n$ grows.

We further note that the mixed kernel scheme has an advantage over RS(4) in terms of decoding complexity. Computing the marginal probabilities of $g_2(\cdot)$ requires many more multiplications and additions than computing those of the $g_1(\cdot)$ kernel. Therefore, decoding a code based on the Mixed-4 kernel requires fewer multiplications and additions than would be required for decoding a code of the same length based on the RS(4) kernel.
A Appendix

A.1 Proof of Proposition 4

We begin by proving the following auxiliary lemma. 
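Lemma 1: Let $\{Y_n\}_{n \ge 0}$ be a bounded sequence defined by

$$Y_n = \begin{cases} Y_n^{(1)}, & n \le T, \\ Y_n^{(2)}, & n > T, \end{cases}$$

where $\{Y_n^{(1)}\}_{n \ge 0}$ and $\{Y_n^{(2)}\}_{n \ge 0}$ are i.i.d. sequences, and $T$ is a random variable with the property

$$\lim_{t \to \infty} \Pr(T > t) = 0, \qquad (25)$$

and $T$ is statistically independent of the sequence $\{Y_n^{(2)}\}_{n \ge 0}$. Then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} Y_i = E\left[Y^{(2)}\right] \quad \text{almost surely}.$$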

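Proof: Denote $\mu_2 = E[Y^{(2)}]$. Using the strong law of large numbers, we know that

$$\Pr\left(\omega \in \Omega : \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} Y_i^{(2)}(\omega) = \mu_2\right) = 1. \qquad (26)$$

Let $\omega \in \Omega$ be such that $\lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} Y_i^{(2)}(\omega) = \mu_2$, and assume that $T(\omega) = t$ (by (25), $T$ is finite almost surely). Then, for $n > t$,

$$\frac{1}{n} \sum_{i=1}^{n} Y_i(\omega) = \frac{1}{n} \sum_{i=1}^{t} Y_i^{(1)}(\omega) + \frac{n-t}{n} \cdot \frac{1}{n-t} \sum_{i=t+1}^{n} Y_i^{(2)}(\omega). \qquad (27)$$

Now, $\frac{1}{n} \sum_{i=1}^{t} Y_i^{(1)}(\omega)$ surely goes to zero as $n$ grows (the sequence is bounded and $t$ is fixed), and $\frac{1}{n-t} \sum_{i=t+1}^{n} Y_i^{(2)}(\omega)$ goes to $\mu_2$ because of the choice of $\omega$ (recall that $T$ and the sequence $\{Y_n^{(2)}\}$ are independent). This means that

$$\lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} Y_i(\omega) = \mu_2. \qquad (28)$$

Therefore,

$$\left\{\omega \in \Omega : \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} Y_i^{(2)}(\omega) = \mu_2,\ T(\omega) < \infty\right\} \subseteq \left\{\omega \in \Omega : \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} Y_i(\omega) = \mu_2\right\},$$

so

$$\Pr\left(\omega \in \Omega : \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} Y_i(\omega) = \mu_2\right) \ge \Pr\left(\omega \in \Omega : \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} Y_i^{(2)}(\omega) = \mu_2\right) = 1.$$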
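Lemma 1 is what makes observation (a) in the proof idea of Proposition 4 work: before $T$ the summands come from $g_1(\cdot)$, afterwards from $g_2(\cdot)$, and only the latter survive in the average. The sketch below simulates $n^{-1}\sum_i \log_4 \underline{D}_i$ along one path of the tree process. The minimum partial distances $(1, 2, 4)$ of $g_1(\cdot)$ follow the chain $(4,4,1)-(4,3,2)-(4,1,4)$; the values $(1, 2, 3, 4)$ for RS(4) are an assumption, and the function name is our own. The running average should approach $E[Y^{(2)}] = E_1(g_2) \approx 0.573$.

```python
import math
import random

D_G1 = {1: 1, (2, 3): 2, 4: 4}  # minimum partial distances of g_1's channels
D_G2 = [1, 2, 3, 4]             # assumed minimum partial distances of RS(4)


def average_log_distance(n, rng):
    """(1/n) * sum_{i<n} log_4 of the min partial distance chosen along one path."""
    total, glued = 0.0, False
    for _ in range(n):
        if not glued:  # i <= T: Y_i^(1) = log_4 D_{1,min}^{(f(B_i))}
            b = rng.choices([1, (2, 3), 4], weights=[0.25, 0.5, 0.25])[0]
            total += math.log(D_G1[b], 4)
            glued = b == (2, 3)
        else:  # i > T: Y_i^(2) = log_4 of a uniformly chosen D_{2,min}^{(i)}
            total += math.log(rng.choice(D_G2), 4)
    return total / n


limit = sum(math.log(d, 4) for d in D_G2) / 4  # E[Y^(2)] = E_1(g_2)
print(limit, average_log_distance(200_000, random.Random(3)))
```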



A.1.1 Proof of (21)

To prove (21), we make the following adjustments to the proofs of Mori and Tanaka in [3, Proposition 15]. We give a comprehensive version of their proofs in Subsection A.3 ([5] does not contain proofs of this statement). There are two parts to the proof. First, we should prove that for an arbitrary fixed $\rho \in (0,1)$,

$$\lim_{n \to \infty} \Pr\left(Z_n < \rho^n\right) = \Pr(Z_\infty = 0).$$

The key method in the proof is applying the law of large numbers to the sequence $\{\underline{D}_n\}$. This is applicable here because of Lemma 1. This leads to the observation that, for any $0 < \beta < \frac{1}{4} \sum_{i=1}^{4} \log_4\left(D_{2,\min}^{(i)}\right)$, for the proper choice of $\rho$, $\gamma$ and $\alpha$, and for large enough $n$ (see Subsection A.3 below for the definitions of $\mathcal{D}_n(\beta)$, $\mathcal{G}_{m,n}(\gamma)$ and $\mathcal{C}_m(\rho)$),

$$\Pr\left(\mathcal{D}_n(\beta)\right) \ge \Pr\left(\mathcal{G}_{\alpha n, n}(\gamma) \cap \mathcal{C}_{\alpha n}(\rho)\right).$$

In the original proof, the next step is to claim that the two events on the right-hand side are independent, which completes the argument. But this does not apply here, because the sequence $\{\underline{D}_n\}$ does not consist of independent samples. However, conditioned on the event $T < \alpha n$, the two events are independent. We have that

$$\Pr\left(\mathcal{D}_n(\beta)\right) \ge \Pr\left(\mathcal{G}_{\alpha n, n}(\gamma) \,\middle|\, T < \alpha n\right) \cdot \Pr\left(\mathcal{C}_{\alpha n}(\rho) \,\middle|\, T < \alpha n\right) \cdot \Pr(T < \alpha n). \qquad (29)$$

This leads us to

$$\lim_{n \to \infty} \Pr\left(\mathcal{D}_n(\beta)\right) \ge \lim_{n \to \infty} \underbrace{\Pr\left(\mathcal{G}_{\alpha n, n}(\gamma) \,\middle|\, T < \alpha n\right)}_{\to 1} \cdot \underbrace{\Pr\left(\mathcal{C}_{\alpha n}(\rho) \,\middle|\, T < \alpha n\right)}_{\to \Pr(Z_\infty = 0)} \cdot \underbrace{\Pr(T < \alpha n)}_{\to 1} = \Pr(Z_\infty = 0).$$



A.1.2 Proof of (22)

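We take the path of the proof of the converse part of [8, Theorem 3]. We consider the sequence $Z_{\min}(W_n)$ and define the sequence $\{\hat{Z}_n\}_{n \ge 0}$ in the following way:

$$\hat{Z}_0 = Z_{\min}(W) = Z(W), \qquad \hat{Z}_{n+1} = c_2 \cdot \hat{Z}_n^{\overline{D}_n}, \quad n \ge 0.$$

By (17) and induction, $Z_{\min}(W_n) \ge \hat{Z}_n$, and therefore

$$\Pr\left(Z_n > \delta_n\right) \ge \Pr\left(Z_{\min}(W_n) > \delta_n\right) \ge \Pr\left(\hat{Z}_n > \delta_n\right). \qquad (30)$$

Assume that $c_2 < 1$; otherwise we can replace it by $c_3 < c_2$ such that $c_3 < 1$. By the definition of $\hat{Z}_n$ we have

$$\log_2\left(\hat{Z}_n\right) = \log_2(c_2) \sum_{j=0}^{n-1} \prod_{i=j+1}^{n-1} \overline{D}_i + \log_2\left(\hat{Z}_0\right) \prod_{i=0}^{n-1} \overline{D}_i, \qquad (31)$$

and, since $\overline{D}_i \ge 1$ and $\log_2(c_2) < 0$,

$$\log_2\left(\hat{Z}_n\right) \ge \left(n \log_2(c_2) + \log_2\left(\hat{Z}_0\right)\right) \prod_{i=0}^{n-1} \overline{D}_i. \qquad (32)$$

Hence, for large enough $n$,

$$\log_4\left(-\log_2\left(\hat{Z}_n\right)\right) \le \log_4\left(n \log_2\left(c_2^{-1}\right) + \log_2\left(\hat{Z}_0^{-1}\right)\right) + \sum_{i=0}^{n-1} \log_4\left(\overline{D}_i\right) = n\left(o(1) + \frac{1}{n} \sum_{i=0}^{n-1} \log_4\left(\overline{D}_i\right)\right). \qquad (33)$$

This results in

$$\Pr\left(\hat{Z}_n > \delta_n\right) \ge \Pr\left(o(1) + \frac{1}{n} \sum_{i=0}^{n-1} \log_4\left(\overline{D}_i\right) < \frac{1}{n} \log_4\left(-\log_2(\delta_n)\right)\right). \qquad (34)$$

Now, set $\ell = 4$, $\delta_n = 2^{-\ell^{n\beta}}$ and $\beta > \frac{1}{4} \sum_{i=1}^{4} \log_4\left(D_{2,\max}^{(i)}\right)$ in (34), so that $\frac{1}{n} \log_4\left(-\log_2(\delta_n)\right) = \beta$. By Lemma 1, $\frac{1}{n} \sum_{i=0}^{n-1} \log_4\left(\overline{D}_i\right)$ goes almost surely to $\frac{1}{4} \sum_{i=1}^{4} \log_4\left(D_{2,\max}^{(i)}\right)$; therefore

$$\lim_{n \to \infty} \Pr\left(o(1) + \frac{1}{n} \sum_{i=0}^{n-1} \log_4\left(\overline{D}_i\right) < \frac{1}{n} \log_4\left(-\log_2(\delta_n)\right)\right) = 1,$$

which, together with (30), proves (22). □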



A.2 Proof of Proposition 6

First, a generalization of Proposition 3 can be stated.

Proposition 7: There exist positive constants $c_1, c_2$ such that

$$Z_{\max}(W_{n+1}) \le c_1 \cdot Z_{\max}(W_n)^{\underline{D}_n}, \qquad (35)$$

$$Z_{\min}(W_{n+1}) \ge c_2 \cdot Z_{\min}(W_n)^{\overline{D}_n}, \quad n \ge 0, \qquad (36)$$

where $Z_{\max}(W_n)$ and $Z_{\min}(W_n)$ are the maximum and the minimum Bhattacharyya parameters of the channel $W_n$. The sequences $\{\underline{D}_n\}_{n \ge 0}$ and $\{\overline{D}_n\}_{n \ge 0}$ are defined as follows:

$$\underline{D}_n = D_{t,\min}^{(f(B_n))}, \qquad \overline{D}_n = D_{t,\max}^{(f(B_n))},$$

where the parameter $t$ indicates the kernel to which the partial distances refer, i.e. $t \in \{1\} \cup \{i+1 : i \in \mathcal{B}\}$: $t$ equals 1 if $N_n = 1$, and otherwise it equals $j$ if $g_j(\cdot)$ is the auxiliary kernel for the alphabet of size $2^{N_n}$. The function $f(\cdot)$ maps the names of the channels to their ordinal numbers.




Using Proposition 7, we can prove Proposition 6 in a similar fashion to the proof of Proposition 4, but instead of using Lemma 1 we use Lemma 2 below. Specifically, for proving (23) we take the same steps we used in the proof of (21), but use the fact that $\lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} \log_\ell \underline{D}_i > (1 - \delta_1) \cdot E_1(g)$ for each $\delta_1 \in (0,1)$, almost surely, by Lemma 2. For proving (24) we take the same steps we used in the proof of (22), but use the fact that $\lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} \log_\ell \overline{D}_i < (1 + \delta_2) \cdot E_2(g)$ for each $\delta_2 \in (0,1)$, almost surely, by Lemma 2.

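Lemma 2: Let $J$ be a random variable taking values in a finite set of numbers $\mathcal{J}$ ($1 \notin \mathcal{J}$). Let $\{Y_n\}_{n \ge 0}$ be a bounded sequence defined by

$$Y_n = \begin{cases} Y_n^{(1)}, & n \le T, \\ Y_n^{(J)}, & n > T, \end{cases}$$

where $\{Y_n^{(1)}\}_{n \ge 0}$ and $\{Y_n^{(j)}\}_{n \ge 0}$, $j \in \mathcal{J}$, are i.i.d. sequences, $T$ is a random variable with the property

$$\lim_{t \to \infty} \Pr(T > t) = 0, \qquad (37)$$

and $T$ and $J$ are statistically independent of the sequences $\{Y_n^{(j)}\}_{n \ge 0}$, $j \in \mathcal{J}$. Let $\mu_j = E[Y^{(j)}]$, $\mu_{\min} = \min_{j \in \mathcal{J}} \mu_j$ and $\mu_{\max} = \max_{j \in \mathcal{J}} \mu_j$. Then, almost surely, $\frac{1}{n} \sum_{i=1}^{n} Y_i$ converges to a number $\mu$, where $\mu \in \{\mu_j \mid j \in \mathcal{J}\}$. Specifically, this means that for all $\delta_1, \delta_2 \in (0,1)$, as $n \to \infty$, almost surely

$$(1 - \delta_1) \cdot \mu_{\min} < \frac{1}{n} \sum_{i=1}^{n} Y_i < (1 + \delta_2) \cdot \mu_{\max}.$$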



A.3 Proof of [3, Proposition 15]

We state a slight variation on the first part of [3, Proposition 15].

Proposition 8 Let $\{X_n \in (0,1)\}$ be a random process satisfying the following properties.
0) For each $n$, $X_{n+1} = f_{i,n}(X_n)$ w.p. $\frac{1}{\ell}$, $i \in [\ell]$, independently of the value of $X_n$, where $\{f_{i,n}(\cdot)\}_{i\in[\ell],\, n\ge 0}$ is a sequence of deterministic functions.
1) $X_n$ converges to $X_\infty \in \{0,1\}$ almost surely.
2) There exists a positive constant $q$ such that $f_{i,n}(x) \le q \cdot x^{d_i}$ for all $x \in (0,1)$, $n \ge 0$, $i \in [\ell]$, where $d_i \ge 1$.

Then,

$$\lim_{n\to\infty} \Pr\left(X_n \le 2^{-\ell^{\beta n}}\right) = \Pr(X_\infty = 0)$$

for each $\beta < E$, where $E = \frac{1}{\ell}\sum_{i=1}^{\ell}\log_\ell(d_i)$.

Proof We first define a random sequence $\{D_n \ge 1\}$ that takes values in $\{d_i \mid i \in [\ell]\}$ according to properties 0 and 2: $D_n = d_i$ whenever $X_{n+1} = f_{i,n}(X_n)$. This sequence is i.i.d. and $X_{n+1} \le q \cdot X_n^{D_n}$. Furthermore, $E = \mathbb{E}[\log_\ell(D_n)]$, and we denote $\tilde{E} = \mathbb{E}[D_n]$.

We take the path of [7] by first giving the following definitions. For natural numbers $m < n$ define

$$S_{m,n} = \sum_{i=m}^{n-1} \log_\ell(D_i), \tag{38}$$

$$\tilde{S}_{m,n} = \sum_{i=m}^{n-1} D_i. \tag{39}$$



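Definition 3 For a fixed $\gamma \in [0,1)$, let $\mathcal{G}_{m,n}(\gamma)$ and $\tilde{\mathcal{G}}_{m,n}(\gamma)$ be the events defined by

$$\mathcal{G}_{m,n}(\gamma) = \left\{\frac{S_{m,n}}{n-m} > \gamma \cdot E\right\},$$

$$\tilde{\mathcal{G}}_{m,n}(\gamma) = \left\{\frac{\tilde{S}_{m,n}}{n-m} > \gamma \cdot \tilde{E}\right\}.$$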
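The event $\mathcal{G}_{m,n}(\gamma)$ can be evaluated exactly in a toy case. Assume, for illustration only, $\ell = 2$ and $(d_1, d_2) = (1, 2)$, so that $\log_2 D_i \in \{0, 1\}$ is a fair coin, $E = 1/2$ and $S_{m,n}$ is Binomial$(n-m, 1/2)$. The hypothetical helper `prob_G` below computes $\Pr(\mathcal{G}_{m,n}(\gamma))$ with exact rational arithmetic; as $n - m$ grows it tends to 1, which is the law-of-large-numbers statement used in (42).

```python
from fractions import Fraction
from math import comb


def prob_G(k, gamma):
    """Exact Pr(S_{m,n} / k > gamma * E) with k = n - m, for the toy choice
    ell = 2, d = (1, 2): S_{m,n} ~ Binomial(k, 1/2) and E = 1/2."""
    E = Fraction(1, 2)
    threshold = gamma * E * k  # S_{m,n} must strictly exceed this value
    favourable = sum(comb(k, s) for s in range(k + 1) if s > threshold)
    return Fraction(favourable, 2 ** k)


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(k, float(prob_G(k, Fraction(4, 5))))
```

For example, `prob_G(10, Fraction(4, 5))` returns `Fraction(319, 512)`: the event requires at least 5 of the 10 coin flips to be 1. Exact fractions avoid the floating-point ties that a strict inequality such as $S_{m,n}/(n-m) > \gamma E$ can otherwise produce.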
Definition 4 Let $p, \beta \in (0,1)$. The events $\mathcal{C}_n(p)$ and $\mathcal{D}_n(\beta)$ are defined as

$$\mathcal{C}_n(p) = \{X_n \le p^n\}, \tag{40}$$

$$\mathcal{D}_n(\beta) = \left\{X_n \le 2^{-\ell^{\beta n}}\right\}. \tag{41}$$

We first prove that the probability of the event $\mathcal{C}_n(p)$ tends to $\Pr(X_\infty = 0)$ as $n \to \infty$. This is then used to prove that the probability of the event $\mathcal{D}_n(\beta)$ also tends to $\Pr(X_\infty = 0)$ as $n \to \infty$.

Proposition 9 For an arbitrary fixed $p \in (0,1)$,

$$\lim_{n\to\infty} \Pr(\mathcal{C}_n(p)) = \Pr(X_\infty = 0).$$

Proof As mentioned in [7], the proof is similar to that of [1, Theorem 2]. We now elaborate on it. By the law of large numbers we have

$$\lim_{n\to\infty} \Pr(\mathcal{G}_{m,n}(\gamma)) = 1, \qquad \gamma \in [0,1). \tag{42}$$

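Now define, for $a \in (0,1)$,

$$T_m(a) = \{\omega \in \Omega \mid X_i(\omega) \le a,\ \forall i \ge m\}.$$

Since the events $T_m(a)$ increase with $m$ and, by property 1, almost every $\omega$ with $X_\infty(\omega) = 0$ belongs to $T_m(a)$ for all sufficiently large $m$,

$$\lim_{m\to\infty} \Pr(T_m(a)) \ge \Pr(X_\infty = 0). \tag{43}$$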



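Now, for $\omega \in T_m(a)$ and $i \ge m$,

$$\frac{X_{i+1}(\omega)}{X_i(\omega)} \le a^{D_i - 1},$$

therefore

$$X_n(\omega) = X_m(\omega) \cdot \prod_{i=m}^{n-1} \frac{X_{i+1}(\omega)}{X_i(\omega)} \le a^{(n-m)\left(\frac{\tilde{S}_{m,n}}{n-m} - 1\right)}.$$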



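By the law of large numbers (see Remark 1), for each $\gamma, \delta \in (0,1)$ there exists $n_0$ such that for each $n > n_0$

$$\Pr\big(\tilde{\mathcal{G}}_{m,n}(\gamma)\big) = \Pr\left(a^{\frac{\tilde{S}_{m,n}}{n-m} - 1} < a^{\gamma\tilde{E} - 1}\right) > 1 - \delta/2. \tag{44}$$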

Also, because of (43), for each $\delta$ and $a$ there exists $m_0$ such that for each $m > m_0$

$$\Pr(T_m(a)) > \Pr(X_\infty = 0) - \delta/2. \tag{45}$$

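Now, if we take $a$ such that $a^{\gamma\tilde{E} - 1} < p - \epsilon$ for some $\epsilon \in (0, p)$ (possible for $\gamma$ close enough to 1, since $\tilde{E} > 1$ whenever $E > 0$), then there exists an $n_1$ such that for each $n > n_1$

$$a^{(n-m_0)(\gamma\tilde{E} - 1)} < p^n.$$

This means that for $n > \max\{n_0, n_1\}$

$$\Pr\big(T_{m_0}(a) \cap \{X_n \le p^n\}\big) \ge \Pr(X_\infty = 0) - \delta.$$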

Therefore, since $\delta$ is arbitrary,

$$\liminf_{n\to\infty} \Pr(X_n \le p^n) \ge \Pr(X_\infty = 0).$$

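Because $0 < p < 1$, we have $p^n \le p$, and since $X_n$ converges almost surely to $X_\infty \in \{0,1\}$ (so $p$ is a continuity point of the law of $X_\infty$),

$$\Pr(X_n \le p^n) \le \Pr(X_n \le p) \to \Pr(X_\infty = 0)$$

as $n \to \infty$. So, finally,

$$\lim_{n\to\infty} \Pr(X_n \le p^n) = \Pr(X_\infty = 0).$$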
Remark 1 Given the event $T_m(a)$, the sequence $\{D_i\}_{i\ge m}$ is not necessarily i.i.d. anymore.

However, in (44) we still want to use the law of large numbers. To do so, take $a$ such that $a < q^{-1}$. Then, if $X_n < a$, we have $X_{n+1} \le q \cdot X_n^{D_n} < (qa)\cdot a^{D_n - 1} < a^{D_n - 1}$, so in case $D_n \ge 2$ we have $X_{n+1} < a$. For $D_n = 1$ it may happen that $X_{n+1} > a$. This means that for the sequence $D_n$,

$$\Pr\big(D_n \ge 2 \mid T_m(a), X_n = x\big) \ge \Pr(D_n \ge 2),$$

$$\Pr\big(D_n = 1 \mid T_m(a), X_n = x\big) \le \Pr(D_n = 1).$$

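Construct a sequence $\hat{D}_n$ in the following way. Given $X_n = x$ and $T_m(a)$, let

$$\hat{D}_n = \begin{cases} D_n, & \text{w.p. } 1 - \pi(n,x); \\ 1, & \text{w.p. } \pi(n,x), \end{cases}$$

where $\pi(n,x)$ is chosen such that

$$\Pr\big(\hat{D}_n = 1 \mid T_m(a), X_n = x\big) = \Pr(D_n = 1).$$

Note that in this case

$$\Pr\big(\hat{D}_n = i \mid T_m(a), X_n = x\big) = \Pr(D_n = i) \qquad \forall i, x,$$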
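which means that $\hat{D}_n$ is an i.i.d. sequence distributed like the sequence $D_n$. We also have $\hat{D}_n \le D_n$, so, writing $\hat{S}_{m,n} = \sum_{i=m}^{n-1} \hat{D}_i$,

$$X_n(\omega) = X_m(\omega)\prod_{i=m}^{n-1}\frac{X_{i+1}(\omega)}{X_i(\omega)} \le a^{(n-m)\left(\frac{\tilde{S}_{m,n}}{n-m} - 1\right)} \le a^{(n-m)\left(\frac{\hat{S}_{m,n}}{n-m} - 1\right)}. \tag{46}$$

Now we can use the law of large numbers on the right side of (46).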

Now, we follow the bootstrapping idea that was presented in [S] and used again in [7]. The idea is that for some $m \ll n$, once a realization of $X_m$ becomes sufficiently small, one can ensure, with probability close to 1, that samples generated conditionally on the realization of $X_m$ converge to 0 exponentially fast. We follow the steps of [7, Section 4.E]. Define a process $\{L_i\}$ using the process $\{X_i\}$ as follows, for a fixed $t_0$:

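$$L_i = \log_2 X_i, \qquad i = 0, 1, 2, \ldots, t_0, \tag{47}$$

$$L_{i+1} = D_i \cdot L_i + C, \qquad i \ge t_0, \tag{48}$$

where $C = \log_2(q)$. By property 2 and induction ($X_{i+1} \le q X_i^{D_i} \le 2^{C + D_i L_i}$), the inequality $X_i \le 2^{L_i}$ holds on a sample-path basis for all $i \ge 0$. Taking $m = t_0$ and assuming, without loss of generality, that $q \ge 1$ (so that $C \ge 0$), we have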

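$$\begin{aligned} L_n &= D_{n-1} \cdot L_{n-1} + C \\ &= D_{n-1}\cdot(D_{n-2}\cdot L_{n-2} + C) + C = \cdots \\ &= \Big(\prod_{j=m}^{n-1} D_j\Big) L_m + C\sum_{i=m+1}^{n}\prod_{j=i}^{n-1} D_j \\ &\le \prod_{j=m}^{n-1} D_j \cdot \big(L_m + C\cdot(n-m)\big) = \ell^{S_{m,n}}\big(L_m + C\cdot(n-m)\big), \end{aligned} \tag{49}$$

where the inequality uses $D_j \ge 1$ and $C \ge 0$.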
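The unrolled bound (49) can be checked numerically. The Python sketch below iterates recursion (48) from a given $L_m$ with $D_i$ drawn uniformly from a list (an illustrative assumption), and returns both $L_n$ and the right side $\prod_j D_j\,(L_m + C(n-m))$ of (49). The helper name `bootstrap_bound` is hypothetical. With integer-valued inputs of moderate size, all floating-point operations here are exact, so the comparison is reliable.

```python
import random


def bootstrap_bound(L_m, C, ds, steps, seed=0):
    """Iterate L_{i+1} = D_i * L_i + C for steps = n - m steps, with D_i drawn
    uniformly from ds, and return (L_n, bound), where
    bound = (prod_{j=m}^{n-1} D_j) * (L_m + C * (n - m)) is the right side of (49)."""
    if C < 0 or min(ds) < 1:
        raise ValueError("(49) needs C >= 0 and every D_i >= 1")
    rng = random.Random(seed)
    L, prod_D = L_m, 1
    for _ in range(steps):
        D = rng.choice(ds)
        L = D * L + C
        prod_D *= D  # equals ell ** S_{m,n} in the notation of (38)
    return L, prod_D * (L_m + C * steps)


if __name__ == "__main__":
    print(bootstrap_bound(-10.0, 1.0, [2], 3))
    print(bootstrap_bound(-1000.0, 1.0, [1, 2, 3], 20, seed=3))
```

If in addition $L_m \le -\epsilon m - C(n-m)$, the returned bound is at most $-\epsilon m \prod_j D_j$, which gives (50) once $\prod_j D_j = \ell^{S_{m,n}} > \ell^{\gamma E (n-m)}$.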
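Lemma 3 Fix $\gamma \in (0,1)$ and $\epsilon > 0$, and let $p$ be such that $\log_2(p) = -\left(\epsilon + C\cdot\frac{n-m}{m}\right)$. Then, conditional on $\mathcal{C}_m(p) \cap \mathcal{G}_{m,n}(\gamma)$, one has from (49)

$$L_n \le -\ell^{\gamma E (n-m)} \cdot \epsilon \cdot m. \tag{50}$$

Indeed, on $\mathcal{C}_m(p)$ we have $L_m \le m \log_2 p$, so $L_m + C(n-m) \le -\epsilon m$, and on $\mathcal{G}_{m,n}(\gamma)$ we have $\ell^{S_{m,n}} > \ell^{\gamma E (n-m)}$.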

Proposition 10 For an arbitrary $\beta \in (0, E)$, we have

$$\lim_{n\to\infty} \Pr(\mathcal{D}_n(\beta)) = \Pr(X_\infty = 0).$$

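Proof Given $\beta \in (0, E)$, choose $\gamma, \alpha \in (0,1)$ such that $\beta = \gamma \cdot (1-\alpha) \cdot E$. Take $m = \alpha n$ and let $\{L_i\}$ denote the process defined in (47) and (48) with respect to $t_0 = \alpha n$. Then, for any $\epsilon > 0$, using Lemma 3 conditional on the event $\mathcal{C}_{\alpha n}(p) \cap \mathcal{G}_{\alpha n,n}(\gamma)$ (with $p$ as defined in the lemma; since $m = \alpha n$, $p$ does not depend on $n$), we have the inequality

$$L_n \le -\ell^{\gamma E (1-\alpha) n} \cdot \epsilon \cdot \alpha n = -\ell^{\beta n} \cdot \epsilon \cdot \alpha n.$$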

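This means that

$$\left\{X_n \le 2^{-\ell^{\beta n}\cdot\epsilon\cdot\alpha n}\right\} \supseteq \mathcal{C}_{\alpha n}(p) \cap \mathcal{G}_{\alpha n,n}(\gamma). \tag{51}$$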



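For any $n > (\epsilon\alpha)^{-1}$, $\beta n < \gamma(1-\alpha)E n + \log_\ell(\epsilon\alpha n)$. Therefore,

$$\mathcal{D}_n(\beta) \supseteq \left\{X_n \le 2^{-\ell^{\gamma E (1-\alpha) n}\cdot\epsilon\cdot\alpha n}\right\}. \tag{52}$$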

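So, using (51), (52) and the independence of $\mathcal{C}_{\alpha n}(p)$ and $\mathcal{G}_{\alpha n,n}(\gamma)$ (the former depends only on the first $\alpha n$ steps of the process, the latter only on $D_{\alpha n}, \ldots, D_{n-1}$), we have

$$\Pr(\mathcal{D}_n(\beta)) \ge \Pr(\mathcal{G}_{\alpha n,n}(\gamma)) \cdot \Pr(\mathcal{C}_{\alpha n}(p)). \tag{53}$$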

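Hence, using Proposition 9 and the law of large numbers, we have

$$\liminf_{n\to\infty} \Pr(\mathcal{D}_n(\beta)) \ge \lim_{n\to\infty} \Pr(\mathcal{G}_{\alpha n,n}(\gamma)) \cdot \Pr(\mathcal{C}_{\alpha n}(p)) = \Pr(X_\infty = 0).$$

The reverse inequality also follows from Proposition 9: for any fixed $p \in (0,1)$ and all sufficiently large $n$ we have $2^{-\ell^{\beta n}} \le p^n$, so $\mathcal{D}_n(\beta) \subseteq \mathcal{C}_n(p)$.

♦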
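A minimal end-to-end illustration of Propositions 8 and 9, assuming the binary erasure channel and Arikan's $2\times 2$ kernel rather than the mixed kernels of this paper: there the Bhattacharyya process evolves as $Z_{k+1} = 2Z_k - Z_k^2$ or $Z_k^2$, with probability 1/2 each. This fits Proposition 8 with $\ell = 2$, $q = 2$, $(d_1, d_2) = (1, 2)$ and $E = 1/2$, and $\Pr(Z_\infty = 0) = 1 - Z_0$. The hypothetical helper below enumerates all $2^n$ equiprobable paths, working with $\log_2 Z$ to avoid underflow, and returns the exact fractions of paths in $\mathcal{C}_n(p)$ and in $\mathcal{D}_n(\beta)$.

```python
import math


def bec_polarization_fractions(z0, n, beta, p):
    """Exact fractions of the 2**n equiprobable paths of the BEC Bhattacharyya
    process (Arikan's 2x2 kernel) with Z_n <= p**n and Z_n <= 2**(-2**(beta*n)).
    Each path is stored as y = log2(Z) to avoid underflow."""
    ys = [math.log2(z0)]
    for _ in range(n):
        nxt = []
        for y in ys:
            z = 2.0 ** y                        # may underflow to 0.0, which is harmless here
            nxt.append(y + math.log2(2.0 - z))  # log2(2z - z^2)
            nxt.append(2.0 * y)                 # log2(z^2)
        ys = nxt
    thr_p = n * math.log2(p)       # log2(p**n)
    thr_beta = -(2.0 ** (beta * n))  # log2(2**(-2**(beta*n)))
    total = len(ys)
    frac_p = sum(y <= thr_p for y in ys) / total
    frac_beta = sum(y <= thr_beta for y in ys) / total
    return frac_p, frac_beta


if __name__ == "__main__":
    for n in (8, 12, 16):
        print(n, bec_polarization_fractions(0.5, n, beta=0.25, p=0.9))
```

As $n$ grows, both fractions should move toward $1 - Z_0$, as Propositions 9 and 8 predict for $\beta < E = 1/2$. Once $2^{\beta n} \ge n \log_2(1/p)$, the doubly exponential threshold is the stricter one, so its fraction can never exceed the other.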






